Construction of a Japanese Relevance-tagged Corpus

Authors

  • Daisuke Kawahara
  • Sadao Kurohashi
  • Kôiti Hasida
Abstract

This paper describes our corpus annotation project. The annotated corpus has relevance tags which consist of predicate-argument relations, relations between nouns, and coreferences. To construct this relevance-tagged corpus, we investigated a large corpus and established the specification of the annotation. This paper shows the specification and difficult tagging problems which have emerged through the annotation so far.

Similar resources

Construction of a Word Sense Tagged Corpus for SENSEVAL-2 Japanese Dictionary Task

This paper reports the details of a Japanese word sense tagged corpus developed as evaluation data for the SENSEVAL-2 Japanese dictionary task. The corpus is made up of 2,130 newspaper articles. Only 10,000 words in the articles, not all of them, were manually annotated with sense IDs, which were used as gold standard data. Word senses were defined according to the Iwanami Kokugo Jiten, a Japanese dictio...

Toward Text Understanding: Integrating Relevance-tagged Corpus and Automatically Constructed Case Frames

This paper proposes a wide-range anaphora resolution system toward text understanding. This system resolves zero, direct and indirect anaphors in Japanese texts by integrating two sorts of linguistic resources: a hand-annotated corpus with various relations and automatically constructed case frames. The corpus has relevance tags which consist of predicate-argument relations, relations between n...

基於非監督式詞義消歧之日語旅遊意見詞翻譯 (Japanese Opinion Word Translation Based on Unsupervised Word Sense Disambiguation in the Travel Domain) [In Chinese]

This paper proposes a Japanese opinion word translation method based on unsupervised word sense disambiguation. The method comprises corpus preparation, opinion word dictionary construction, and a weighting method. Unlike machine translation, our method does not need parallel corpora, tagged corpora, or parsed treebanks. Our method is low-cost but effective, and requires a well-m...

Bond, Francis, Timothy Baldwin, Richard Fothergill and Kiyotaka Uchimoto (2012) Japanese SemCor: A Sense-tagged Corpus of Japanese, In Proceedings of the 6th International Global Wordnet Conference (GWC 2012), Matsue, Japan

In this paper we describe the creation of the Japanese SemCor (JSEMCOR) sense-tagged corpus of Japanese. The corpus is a translation of the English SEMCOR, with senses projected across from English. The final corpus consists of 14,169 sentences with 150,555 content words of which 58,265 are sense tagged. The corpus is one of the corpora used to provide sense frequency data for the Japanese Wordnet.

Developing Parallel Sense-tagged Corpora with Wordnets

Semantically annotated corpora play an important role in natural language processing. This paper presents the results of a pilot study on building a sense-tagged parallel corpus, part of ongoing construction of aligned corpora for four languages (English, Chinese, Japanese, and Indonesian) in four domains (story, essay, news, and tourism) from the NTU-Multilingual Corpus. Each subcorpus is firs...

Publication date: 2002